Improved I-vector-based Speaker Recognition for Utterances with Speaker Generated Non-speech sounds
Authors
Abstract
Conversational speech not only contains several variants of neutral speech but is also prominently interlaced with several speaker-generated non-speech sounds such as laughter and breath. A robust speaker recognition system should be capable of recognizing a speaker irrespective of these variations in his speech. Understanding whether the speaker-specific information represented by these variations is similar helps in building a good speaker recognition system. In this paper, the speaker variations captured by the neutral speech of a speaker are analyzed by considering speech-laugh (a variant of neutral speech) and laughter (non-speech) sounds of the speaker. We study an i-vector-based speaker recognition system trained only on neutral speech and evaluate its performance on speech-laugh and laughter. Further, we analyze the effect of including laughter sounds during training of an i-vector-based speaker recognition system. Our experimental results show that the inclusion of laughter sounds during training seems to provide complementary speaker-specific information, which results in an overall improved performance of the speaker recognition system, especially on utterances with speech-laugh segments.
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
i-vector Based Speaker Recognition on Short Utterances
Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real world applications often have access to only limited duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provi...
Denoising autoencoder-based speaker feature restoration for utterances of short duration
This paper describes a speaker feature restoration method for improving text-independent speaker recognition with short utterances. The method employs a denoising autoencoder (DAE) to compensate speaker features of a short utterance which contains limited phonetic information. It first estimates phonetic distribution in the utterance as posteriors based on speech models and then transforms an i...
An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems
In this paper, we introduce an enhancement for speech recognition systems using an unsupervised speaker clustering technique. The proposed technique is mainly based on I-vectors and a Self-Organizing Map (SOM) neural network. The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector and then SOM is used to group the utterances to di...
Journal title:
- CoRR
Volume abs/1705.09289, Issue -
Pages -
Publication date 2017